Asymptotic Theory for Random Forests

Authors

  • Stefan Wager
  • S. WAGER
Abstract

Random forests have proven to be reliable predictive algorithms in many application areas. Not much is known, however, about the statistical properties of random forests. Several authors have established conditions under which their predictions are consistent, but these results do not provide practical estimates of random forest errors. In this paper, we analyze a random forest model based on subsampling and show that random forest predictions are asymptotically normal provided that the subsample size s scales as s(n)/n = o(log(n)^(-d)), where n is the number of training examples and d is the number of features. Moreover, we show that the asymptotic variance can be consistently estimated using an infinitesimal jackknife for bagged ensembles recently proposed by Efron (2014). In other words, our results let us both characterize and estimate the error distribution of random forest predictions, thus taking a step towards making random forests into tools for statistical inference instead of just black-box predictive algorithms.
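As a rough illustration of the variance estimator mentioned above, the sketch below computes an infinitesimal jackknife estimate in the form V_IJ = sum over i of Cov_b(N_bi, t_b)^2 for a bagged predictor, where N_bi counts how often training point i appears in resample b and t_b is the prediction of the learner fit on that resample. Several assumptions apply: the base learner `knn_predict` is a hypothetical stand-in for a regression tree; the sketch uses bootstrap resampling (Efron's bagging setting) rather than the subsampling scheme analyzed in the paper; and no Monte Carlo bias correction is applied, so a fairly large number of resamples is needed. The confidence interval in the demo uses a normal approximation of the kind the asymptotic normality result motivates.

```python
import math
import random
from statistics import NormalDist


def knn_predict(xs, ys, x0, k=5):
    """Hypothetical base learner (stand-in for one tree): average the
    responses of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda j: abs(xs[j] - x0))[:k]
    return sum(ys[j] for j in nearest) / len(nearest)


def bagged_ij(xs, ys, x0, n_boot=500, k=5, seed=0):
    """Bootstrap-bagged prediction at x0 and an infinitesimal jackknife
    variance estimate V_IJ = sum_i Cov_b(N_bi, t_b)^2 (no bias correction)."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []   # t_b: prediction of the b-th resampled learner at x0
    counts = []  # N_bi: number of times point i appears in resample b
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap draw
        n_i = [0] * n
        for j in idx:
            n_i[j] += 1
        counts.append(n_i)
        preds.append(knn_predict([xs[j] for j in idx], [ys[j] for j in idx], x0, k))

    t_bar = sum(preds) / n_boot  # the bagged prediction
    v_ij = 0.0
    for i in range(n):
        n_bar = sum(c[i] for c in counts) / n_boot
        # Covariance across resamples between inclusion count and prediction
        cov = sum((c[i] - n_bar) * (t - t_bar) for c, t in zip(counts, preds)) / n_boot
        v_ij += cov ** 2
    return t_bar, v_ij


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [data_rng.uniform(0, 1) for _ in range(100)]
    ys = [math.sin(4 * x) + data_rng.gauss(0, 0.3) for x in xs]
    mean, var = bagged_ij(xs, ys, 0.5)
    half = NormalDist().inv_cdf(0.975) * math.sqrt(var)
    print(f"prediction {mean:.3f}, approx. 95% CI [{mean - half:.3f}, {mean + half:.3f}]")
```

Tracking the inclusion counts N_bi alongside each prediction is what makes the estimate essentially free once the ensemble is grown: no refitting is required beyond the resamples already used for bagging.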


Related articles

On the asymptotics of random forests

The last decade has witnessed a growing interest in random forest models which are recognized to exhibit good practical performance, especially in high-dimensional settings. On the theoretical side, however, their predictive power remains largely unexplained, thereby creating a gap between theory and practice. The aim of this paper is twofold. Firstly, we provide theoretical guarantees to link ...


Generalized Random Forests

We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method operates at a particular point in covariate space by considering a weighted set...


Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Many scientific and engineering challenges—ranging from personalized medicine to customized marketing recommendations—require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman’s widely used random forest algorithm. Given a potential outcomes framework with unconfoundedn...


Random Forests and Adaptive Nearest Neighbors

In this paper we study random forests through their connection with a new framework of adaptive nearest neighbor methods. We first introduce a concept of potential nearest neighbors (k-PNN’s) and show that random forests can be seen as adaptively weighted k-PNN methods. Various aspects of random forests are then studied from this perspective. We investigate the effect of terminal node sizes and...


Asymptotic Behavior of Weighted Sums of Weakly Negative Dependent Random Variables

Let be a sequence of weakly negative dependent (WND) random variables with common distribution function F and let be another sequence of positive random variables independent of and for some and for all . In this paper, we study the asymptotic behavior of the tail probabilities of the maximum, weighted sums, randomly weighted sums and randomly indexed weighted sums of heavy...



Publication date: 2016